How do we count? The Problem of Tagging Phrasal Verbs in PARTS
Abstract
This paper examines the current performance of the stochastic tagger PARTS (Church 88) in handling phrasal verbs, describes a problem that arises from the statistical model used, and suggests a way to improve the tagger's performance. The solution involves a change in the definition of what counts as a word for the purpose of tagging phrasal verbs.

1. INTRODUCTION

Statistical taggers are commonly used to preprocess natural language. Operations like parsing, information retrieval, machine translation, and so on, are facilitated by having as input a text tagged with a part of speech label for each lexical item. In order to be useful, a tagger must be accurate as well as efficient. The claim among researchers advocating the use of statistics for NLP (e.g. Marcus et al. 92) is that taggers are routinely correct about 95% of the time. The 5% error rate is not perceived as a problem mainly because human taggers disagree or make mistakes at approximately the same rate.

On the other hand, even a 5% error rate can cause a much higher rate of mistakes later in processing if the mistake falls on a key element that is crucial to the correct analysis of the whole sentence. One example is the phrasal verb construction (e.g. gun down, back off). An error in tagging this two-element sequence will cause the analysis of the entire sentence to be faulty. An analysis of the errors made by the stochastic tagger PARTS (Church 88) reveals that phrasal verbs do indeed constitute a problem for the model.

2. PHRASAL VERBS

The basic assumption underlying the stochastic process is the notion of independence. Words are defined as units separated by spaces and then undergo statistical approximations. As a result the elements of a phrasal verb are treated as two individual words, each with its own lexical probability (i.e. the probability of observing part of speech i given word j). An interesting pattern emerges when we examine the errors involving phrasal verbs.
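The lexical probability defined above can be illustrated with a relative-frequency estimate over a tagged corpus. The following is a minimal sketch, not the PARTS implementation: the function name, the tag labels, and the toy sample are all invented for illustration. It shows that when words are space-separated units, the estimate for a word such as gun is the same whether or not a particle follows it.

```python
from collections import Counter, defaultdict


def lexical_probabilities(tagged_tokens):
    """Estimate P(tag | word) by relative frequency from (word, tag) pairs."""
    counts = defaultdict(Counter)
    for word, tag in tagged_tokens:
        counts[word.lower()][tag] += 1
    return {
        word: {tag: n / sum(tags.values()) for tag, n in tags.items()}
        for word, tags in counts.items()
    }


# Words are units separated by spaces, so a phrasal verb yields separate tokens.
tokens = "to gun down".split()

# Toy tagged sample, invented for illustration (not Brown Corpus data).
sample = [
    ("gun", "NOUN"), ("gun", "NOUN"), ("gun", "NOUN"), ("gun", "VERB"),
    ("down", "IN"), ("down", "PART"),
]
probs = lexical_probabilities(sample)
print(tokens)
print(probs["gun"])
```

Because each token is counted on its own, the single verbal use of gun in the sample is outweighed by its nominal uses, regardless of what follows it.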
A phrasal verb such as sum up will be tagged by PARTS as noun + preposition instead of verb + particle. This error influences the tagging of other words in the sentence as well. One typical error is found in infinitive constructions, where a phrase like to gun down is tagged as INTO NOUN IN (a prepositional 'to' followed by a noun followed by another preposition). Words like gun, back, and sum, in isolation, have a very high probability of being nouns as opposed to verbs, which results in the misclassification described above. However, when these words are followed by a particle, they are usually verbs, and in the infinitive construction, always verbs.

2.1. THE HYPOTHESIS

The error appears to follow from the operation of the stochastic process itself. In a trigram model the probability of each word is calculated by taking into consideration two elements: the lexical probability (probability of the word bearing a certain tag) and the contextual probability (probability of a word bearing a certain tag given two previous parts of speech). As a result, if an element has a very high lexical probability of being a noun (gun is a noun in 99 out of 102 occurrences in the Brown Corpus), it will not only influence but will actually override the contextual probability, which might suggest a different assignment. In the case of to gun down the ambiguity of to is enhanced by the ambiguity of gun, and a mistake in tagging gun will automatically lead to an incorrect tagging of to as a preposition.

It follows that the tagger should perform poorly on
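The override effect described in the hypothesis can be sketched numerically. The example below is an illustration under stated assumptions, not the PARTS scoring code: it multiplies lexical and contextual probabilities over a tag sequence, conditioning each tag on the two previous tags as the text describes. Only the 99 out of 102 figure for gun comes from the text. Treating the remaining 3 occurrences as verbs, the tag labels (TO, IN, NOUN, VERB, PART), and every other number are invented for illustration.

```python
from math import prod

# P(tag | word). Only the 99/102 noun figure for "gun" is from the text;
# all other values are invented for illustration.
LEXICAL = {
    "to": {"TO": 0.6, "IN": 0.4},
    "gun": {"NOUN": 99 / 102, "VERB": 3 / 102},
    "down": {"PART": 0.5, "IN": 0.5},
}

# P(tag | two previous tags), keyed as (prev2, prev1, tag). All values invented.
CONTEXT = {
    ("<s>", "<s>", "TO"): 0.1,
    ("<s>", "TO", "VERB"): 0.9,
    ("TO", "VERB", "PART"): 0.3,
    ("<s>", "<s>", "IN"): 0.1,
    ("<s>", "IN", "NOUN"): 0.3,
    ("IN", "NOUN", "IN"): 0.3,
}


def trigram_contexts(tags):
    """Return (prev2, prev1, tag) triples, padding the start with '<s>'."""
    padded = ["<s>", "<s>", *tags]
    return [tuple(padded[i:i + 3]) for i in range(len(tags))]


def context_score(tags):
    """Product of contextual probabilities only."""
    return prod(CONTEXT[c] for c in trigram_contexts(tags))


def sequence_score(words, tags):
    """Product of lexical and contextual probabilities for one tag sequence."""
    lexical = prod(LEXICAL[w][t] for w, t in zip(words, tags))
    return lexical * context_score(tags)


words = ["to", "gun", "down"]
verb_reading = ["TO", "VERB", "PART"]  # infinitive + verb + particle
noun_reading = ["IN", "NOUN", "IN"]    # preposition + noun + preposition

print(context_score(verb_reading), context_score(noun_reading))
print(sequence_score(words, verb_reading), sequence_score(words, noun_reading))
```

With these assumed numbers the context alone favors the verb reading, yet the combined score favors the noun reading, because the lexical probability of gun as a noun dominates the product.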